Analysis Walkthrough Example
Getting Started
This walkthrough covers how to visualize data in map form, using Census data pulled through the tidycensus package. We’ll go through customizing the maps as well.
To begin, we must load our packages.
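The loading step itself is not shown here, so the following is only a sketch of what it might look like. The package list is an assumption inferred from the functions used later in the walkthrough; in particular, it assumes sync() comes from the leafsync package.

```r
# Packages inferred from the functions used in this walkthrough.
# This list is an assumption, not the author's original chunk.
library(tidyverse)        # %>% and select()
library(tidycensus)       # load_variables() and get_acs()
library(mapview)          # mapview() and mapviewOptions()
library(leafsync)         # sync() (assumed source of this function)
library(leaflet.extras2)  # side-by-side slider with `|`

# RColorBrewer is called later as RColorBrewer::brewer.pal(),
# so it needs to be installed but not attached.
```

If any of these are missing, install.packages() with the package name should take care of it.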
Choosing Variables
We’ll be using the tidycensus package to pull both Census data and geospatial boundaries.
To access the data used in this walkthrough, we’ll need a Census API key. You can learn how to install and use one here.
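As a rough sketch, registering a key with tidycensus usually looks something like the following. The key string is a placeholder, not a real key, and install = TRUE is an optional choice that stores the key in your .Renviron file for future sessions.

```r
library(tidycensus)

# "YOUR_KEY_HERE" is a placeholder; substitute the key the Census Bureau emails you.
# install = TRUE writes the key to ~/.Renviron so it persists across sessions.
census_api_key("YOUR_KEY_HERE", install = TRUE)

# Reload .Renviron so the key is available in the current session too.
readRenviron("~/.Renviron")

# Confirm the key is visible to R (prints the stored key).
Sys.getenv("CENSUS_API_KEY")
```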
Now, we choose variables we want to use from the American Community Survey, conducted by the US Census Bureau. There are many to choose from, and we can look at them by using the load_variables() function.
I assigned the result to a variable called acs. Since there are lots of variables, it’s helpful to view the entire acs dataframe and read the descriptions. We will pull total population (assigned to totalpop), median household income (assigned to medincome), and median age (assigned to medage).
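The call that creates acs is not shown, so here is a minimal sketch of how it could be written. The year (2021) and dataset ("acs5") are assumptions, chosen to match the 2017-2021 5-year ACS that get_acs() reports later in this walkthrough.

```r
library(tidycensus)
library(dplyr)

# Year and dataset are assumptions: 2021 and "acs5" correspond to the
# 2017-2021 5-year ACS. cache = TRUE stores the table for faster reloads.
acs <- load_variables(2021, "acs5", cache = TRUE)

# In RStudio, View(acs) opens the full table so you can search the descriptions.
# View(acs)

# Look up the three variable codes used in this walkthrough.
acs %>%
  filter(name %in% c("B01003_001", "B19013_001", "B01002_001"))
```

The name column holds the variable codes, while label and concept describe what each code measures.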
The c() function creates a named vector that maps our chosen names to the Census variable codes, and we assign it to myvars.
Code
# choose the variables we want
myvars <- c(totalpop = "B01003_001",
            medincome = "B19013_001",
            medage = "B01002_001"
)
Creating a New Dataframe
Now, we pull the information for GA counties. To do so, we use the get_acs() function. The arguments are as follows:
- geography = “county”: we pull data for each county
- variables = c(myvars): we use the variables we chose previously (totalpop, medincome, medage)
- state = “GA”: we restrict the pull to Georgia
- output = “wide”: this gives each variable its own column, which makes the data easier to read
- geometry = TRUE: this includes the county boundary geometry needed to make a map
We’re assigning the result to ga_counties_withgeo.
Code
#pull for GA counties
ga_counties_withgeo <- get_acs(geography = "county",
                               variables = c(myvars),
                               state = "GA",
                               output = "wide",
                               geometry = TRUE)
Getting data from the 2017-2021 5-year ACS
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Code
ga_counties_withgeo
Simple feature collection with 159 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -85.60516 ymin: 30.35785 xmax: -80.84038 ymax: 35.00124
Geodetic CRS: NAD83
First 10 features:
GEOID NAME totalpopE totalpopM medincomeE medincomeM
1 13021 Bibb County, Georgia 156711 NA 43862 1778
2 13049 Charlton County, Georgia 12416 NA 45494 5791
3 13283 Treutlen County, Georgia 6410 NA 35441 9710
4 13309 Wheeler County, Georgia 7568 NA 26776 3605
5 13279 Toombs County, Georgia 26956 NA 42975 3095
6 13077 Coweta County, Georgia 144928 NA 83486 2974
7 13153 Houston County, Georgia 161177 NA 70313 3057
8 13183 Long County, Georgia 16398 NA 52742 8858
9 13163 Jefferson County, Georgia 15708 NA 42238 4150
10 13261 Sumter County, Georgia 29690 NA 36687 2163
medageE medageM geometry
1 36.2 0.3 MULTIPOLYGON (((-83.89192 3...
2 40.6 1.5 MULTIPOLYGON (((-82.4156 31...
3 39.9 5.3 MULTIPOLYGON (((-82.74762 3...
4 33.6 10.0 MULTIPOLYGON (((-82.93976 3...
5 37.8 0.9 MULTIPOLYGON (((-82.48038 3...
6 38.9 0.3 MULTIPOLYGON (((-85.0132 33...
7 35.9 0.3 MULTIPOLYGON (((-83.85685 3...
8 33.7 0.8 MULTIPOLYGON (((-81.98162 3...
9 40.5 0.8 MULTIPOLYGON (((-82.66192 3...
10 37.0 1.1 MULTIPOLYGON (((-84.44381 3...
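The call above relies on get_acs() defaults for the survey year, which can change between tidycensus versions. As a hedged sketch, the same pull can be pinned to the 2017-2021 5-year ACS reported in the message above by passing year and survey explicitly. The object name ga_counties_2021 is hypothetical and used only for illustration.

```r
library(tidycensus)

# Same variables as earlier in the walkthrough.
myvars <- c(totalpop = "B01003_001",
            medincome = "B19013_001",
            medage = "B01002_001")

# year = 2021 with survey = "acs5" requests the 2017-2021 5-year ACS,
# so the result does not depend on the package's default year.
# ga_counties_2021 is a hypothetical name for illustration.
ga_counties_2021 <- get_acs(geography = "county",
                            variables = myvars,
                            state = "GA",
                            year = 2021,
                            survey = "acs5",
                            output = "wide",
                            geometry = TRUE)
```

Pinning the year this way should make the walkthrough easier to reproduce later.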
We can also pull every county in the US by leaving out the state argument, but be mindful that this many counties is harder to visualize on a single map.
Code
# all counties in the US
all_counties_withgeo <- get_acs(geography = "county",
                                variables = c(myvars),
                                output = "wide",
                                geometry = TRUE)
Getting data from the 2017-2021 5-year ACS
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Code
head(all_counties_withgeo)
Simple feature collection with 6 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -111.6345 ymin: 39.04326 xmax: -91.52874 ymax: 45.63864
Geodetic CRS: NAD83
GEOID NAME totalpopE totalpopM medincomeE medincomeM
1 20161 Riley County, Kansas 72602 NA 53296 2489
2 19159 Ringgold County, Iowa 4739 NA 57700 5058
3 30009 Carbon County, Montana 10488 NA 63178 4261
4 16007 Bear Lake County, Idaho 6327 NA 60337 7039
5 55011 Buffalo County, Wisconsin 13314 NA 61167 2352
6 31185 York County, Nebraska 14164 NA 66337 4128
medageE medageM geometry
1 25.5 0.1 MULTIPOLYGON (((-96.96095 3...
2 44.3 1.0 MULTIPOLYGON (((-94.47167 4...
3 50.7 0.9 MULTIPOLYGON (((-109.7987 4...
4 38.9 1.1 MULTIPOLYGON (((-111.6345 4...
5 46.5 0.5 MULTIPOLYGON (((-92.08384 4...
6 39.5 1.2 MULTIPOLYGON (((-97.82629 4...
As you can see in the results above, there are E and M columns. The ones ending in “E” are estimates, and the ones ending in “M” are margins of error, which we do not need for this analysis. So, we remove those columns with the select() function. The - symbol drops columns, and the ends_with() function identifies those ending in “M”.
Code
# remove MOE columns - they all end with "M"
ga_counties_withgeo <- ga_counties_withgeo %>%
  select(-ends_with("M"))
ga_counties_withgeo
Simple feature collection with 159 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -85.60516 ymin: 30.35785 xmax: -80.84038 ymax: 35.00124
Geodetic CRS: NAD83
First 10 features:
GEOID NAME totalpopE medincomeE medageE
1 13021 Bibb County, Georgia 156711 43862 36.2
2 13049 Charlton County, Georgia 12416 45494 40.6
3 13283 Treutlen County, Georgia 6410 35441 39.9
4 13309 Wheeler County, Georgia 7568 26776 33.6
5 13279 Toombs County, Georgia 26956 42975 37.8
6 13077 Coweta County, Georgia 144928 83486 38.9
7 13153 Houston County, Georgia 161177 70313 35.9
8 13183 Long County, Georgia 16398 52742 33.7
9 13163 Jefferson County, Georgia 15708 42238 40.5
10 13261 Sumter County, Georgia 29690 36687 37.0
geometry
1 MULTIPOLYGON (((-83.89192 3...
2 MULTIPOLYGON (((-82.4156 31...
3 MULTIPOLYGON (((-82.74762 3...
4 MULTIPOLYGON (((-82.93976 3...
5 MULTIPOLYGON (((-82.48038 3...
6 MULTIPOLYGON (((-85.0132 33...
7 MULTIPOLYGON (((-83.85685 3...
8 MULTIPOLYGON (((-81.98162 3...
9 MULTIPOLYGON (((-82.66192 3...
10 MULTIPOLYGON (((-84.44381 3...
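To see the column selection in isolation, here is a small sketch on a made-up one-row table. The toy object and its column subset are illustrative assumptions, not part of the Census pull. One detail worth knowing is that ends_with() ignores case by default, so it would also match a column ending in a lowercase “m”.

```r
library(dplyr)

# A toy table whose columns mimic the wide get_acs() output (illustrative only).
toy <- tibble(GEOID = "13021",
              NAME = "Bibb County, Georgia",
              totalpopE = 156711,
              totalpopM = NA,
              medageE = 36.2,
              medageM = 0.3)

# Drop every column whose name ends in "M" (case-insensitive by default).
toy_estimates <- toy %>%
  select(-ends_with("M"))

names(toy_estimates)
```

Passing ignore.case = FALSE to ends_with() makes the match strict if a dataset mixes upper- and lowercase suffixes.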
…we’ll also remove the trailing “E” from the estimate columns, which we will use for analysis. The sub() function allows us to do so. In the pattern E$, the $ anchors the match to the end of the string, so only a final E is removed. Because NAME also ends in E, we leave that column out of the renaming.
Code
# remove that trailing "E" from every column except NAME
est_cols <- colnames(ga_counties_withgeo) != "NAME"
colnames(ga_counties_withgeo)[est_cols] <- sub("E$", "", colnames(ga_counties_withgeo)[est_cols]) # $ means end of string only
ga_counties_withgeo
Simple feature collection with 159 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -85.60516 ymin: 30.35785 xmax: -80.84038 ymax: 35.00124
Geodetic CRS: NAD83
First 10 features:
GEOID NAME totalpop medincome medage
1 13021 Bibb County, Georgia 156711 43862 36.2
2 13049 Charlton County, Georgia 12416 45494 40.6
3 13283 Treutlen County, Georgia 6410 35441 39.9
4 13309 Wheeler County, Georgia 7568 26776 33.6
5 13279 Toombs County, Georgia 26956 42975 37.8
6 13077 Coweta County, Georgia 144928 83486 38.9
7 13153 Houston County, Georgia 161177 70313 35.9
8 13183 Long County, Georgia 16398 52742 33.7
9 13163 Jefferson County, Georgia 15708 42238 40.5
10 13261 Sumter County, Georgia 29690 36687 37.0
geometry
1 MULTIPOLYGON (((-83.89192 3...
2 MULTIPOLYGON (((-82.4156 31...
3 MULTIPOLYGON (((-82.74762 3...
4 MULTIPOLYGON (((-82.93976 3...
5 MULTIPOLYGON (((-82.48038 3...
6 MULTIPOLYGON (((-85.0132 33...
7 MULTIPOLYGON (((-83.85685 3...
8 MULTIPOLYGON (((-81.98162 3...
9 MULTIPOLYGON (((-82.66192 3...
10 MULTIPOLYGON (((-84.44381 3...
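To make the behavior of the E$ pattern concrete, here is a small base R sketch on a plain character vector. The vector cols simply mirrors the column names in this walkthrough. Note that NAME also ends in E, so the bare pattern shortens it to NAM unless that column is excluded.

```r
# Column names mirroring the ones in this walkthrough.
cols <- c("GEOID", "NAME", "totalpopE", "medincomeE", "medageE", "geometry")

# Strip a trailing capital E from every name; NAME is caught too.
stripped <- sub("E$", "", cols)
stripped

# One possible guard: leave NAME untouched and strip the E everywhere else.
cleaned <- ifelse(cols == "NAME", cols, sub("E$", "", cols))
cleaned
```

Names without a final capital E, such as GEOID and geometry, pass through unchanged in both versions.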
Mapping GA counties with Mapview
Our first simple maps use mapview(). It takes our dataframe (ga_counties_withgeo) and the name of the column to color by (zcol) as arguments. The first map shows median income and the second shows median age in each GA county.
Code
mapview(ga_counties_withgeo, zcol = "medincome")
Code
mapview(ga_counties_withgeo, zcol = "medage")
Customizing
To jazz things up a bit, let’s change from the default color palette. We do so by adding an argument called col.regions. This uses the RColorBrewer package, which provides different discrete and continuous palettes. We are using the “Greens” palette, and alpha.regions = 1 makes the county fill fully opaque. Below is a map showing median income with the new palette.
Code
mapview(ga_counties_withgeo, zcol = "medincome",
        col.regions = RColorBrewer::brewer.pal(9, "Greens"),
        alpha.regions = 1)
Warning: Found less unique colors (9) than unique zcol values (159)!
Interpolating color vector to match number of zcol values.
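The warning appears because brewer.pal(9, "Greens") returns only nine colors while the map needs one per unique value, so mapview interpolates between them. One way to make that interpolation explicit is to build a palette function with colorRampPalette() from base R. This is a sketch, not the approach used in the rest of this walkthrough. The name greens_ramp is hypothetical, and whether passing the function to col.regions silences the warning in your mapview version is an assumption worth checking.

```r
library(RColorBrewer)

# Build a function that interpolates the nine "Greens" colors
# into as many shades as requested. greens_ramp is a hypothetical name.
greens_ramp <- colorRampPalette(brewer.pal(9, "Greens"))

# Ask for one shade per Georgia county.
county_greens <- greens_ramp(159)
length(county_greens)

# Possible usage (assumes ga_counties_withgeo from earlier and that
# mapview accepts a palette function for col.regions):
# mapview(ga_counties_withgeo, zcol = "medincome",
#         col.regions = greens_ramp, alpha.regions = 1)
```

The resulting map should look the same as the one above, since the interpolation is the same idea mapview applies on its own.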
This map’s dark background appeared automatically, because mapview determined the map included a lot of light colors. You can turn off that feature with the following code, which keeps the default light basemap in place.
Code
mapviewOptions("basemaps.color.shuffle" = FALSE)
Here’s a new map with the light background.
Code
mapview(ga_counties_withgeo, zcol = "medincome",
        col.regions = RColorBrewer::brewer.pal(9, "Greens"),
        alpha.regions = 1)
Warning: Found less unique colors (9) than unique zcol values (159)!
Interpolating color vector to match number of zcol values.
We can also compare two maps at the same time! To do this, assign each map to its own variable. map_income is our map of median household income in GA counties, while map_age is our map of median age in GA counties.
Code
map_income <- mapview(ga_counties_withgeo, zcol = "medincome",
                      col.regions = RColorBrewer::brewer.pal(9, "Greens"),
                      alpha.regions = 1)
Warning: Found less unique colors (9) than unique zcol values (159)!
Interpolating color vector to match number of zcol values.
Code
map_age <- mapview(ga_counties_withgeo, zcol = "medage",
                   col.regions = RColorBrewer::brewer.pal(9, "Greens"),
                   alpha.regions = 1)
Warning: Found less unique colors (9) than unique zcol values (97)!
Interpolating color vector to match number of zcol values.
The sync() function shows two maps together, like so:
Code
# two maps together
sync(map_income, map_age)
We can also show the two maps with a side-by-side slider by separating the map variables with the “|” symbol. This feature relies on the leaflet.extras2 package.
Code
map_income | map_age
Finally, we can also turn off legends for a cleaner appearance. Make sure your map is still interpretable without a legend, however, since the visualizations in your projects should be accessible.
Code
mapview(ga_counties_withgeo, zcol = "medincome",
        col.regions = RColorBrewer::brewer.pal(9, "Greens"),
        alpha.regions = 1,
        legend = FALSE)
Warning: Found less unique colors (9) than unique zcol values (159)!
Interpolating color vector to match number of zcol values.